Overview:

This page contains the results of CoNGA analyses. Tables shown here may have been filtered to reduce redundancy, focus on the most important columns, and limit length; the full tables should be available as OUTFILE_PREFIX*.tsv files.

Command:

/Users/Nick/conga/scripts/run_conga.py --gex_data merged_COVID_gex.h5ad --gex_data_type h5ad --clones_file merged_COVID_clones.tsv --organism human --graph_vs_graph --outfile_prefix ./CoNGA.output --no_kpca

Stats

num_cells_w_gex: 11282
num_features_start: 36601
num_cells_w_tcr: 10155
min_genes_per_cell: 200
max_genes_per_cell: 2500
max_percent_mito: 0.1
num_filt_max_genes_per_cell: 3463
num_filt_max_percent_mito: 114
num_TR_genes: 151
num_TR_genes_in_hvg_set: 92
num_highly_variable_genes: 1396
num_cells_after_filtering: 6578
num_clonotypes: 5453
max_clonotype_size: 135
num_singleton_clonotypes: 4864
nbr_frac_for_nndists: 0.01
num_gvg_hit_clonotypes: 80
num_gvg_hit_biclusters: 7

graph_vs_graph


Graph-vs-graph analysis looks for correlation between the GEX and TCR spaces by finding statistically significant overlap between two similarity graphs, one defined by GEX similarity and one by TCR sequence similarity.

Overlap is computed one node (clonotype) at a time by comparing that node's neighbors in the GEX graph with its neighbors in the TCR graph. The null model is that the two neighbor sets are chosen independently at random.
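Under the null model above, the size of the overlap between a fixed GEX neighbor set and a randomly chosen TCR neighbor set follows a hypergeometric distribution, so the per-clonotype P value can be computed as an upper-tail probability. The sketch below assumes that scoring rule and assumes the clonotype itself is excluded from the population; the function names are hypothetical, and CoNGA's own implementation may handle details (such as the overlap_corrected column) differently. It uses only the standard library; scipy.stats.hypergeom.sf(overlap - 1, population, num_gex_nbrs, num_tcr_nbrs) computes the same quantity.

```python
from math import comb


def overlap_pvalue(overlap: int, num_gex_nbrs: int, num_tcr_nbrs: int,
                   population: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(population, num_gex_nbrs, num_tcr_nbrs).

    Null model: the TCR neighbor set is a uniformly random subset of size
    num_tcr_nbrs drawn from `population` clonotypes, of which num_gex_nbrs
    are GEX neighbors.
    """
    total = comb(population, num_tcr_nbrs)
    upper = min(num_gex_nbrs, num_tcr_nbrs)
    # Exact integer sum of the hypergeometric tail, divided once at the end.
    tail = sum(
        comb(num_gex_nbrs, k) * comb(population - num_gex_nbrs, num_tcr_nbrs - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def neighbor_overlap_pvalue(gex_nbrs: set, tcr_nbrs: set, num_clonotypes: int) -> float:
    """Hypothetical helper: score one clonotype from its two neighbor sets."""
    overlap = len(gex_nbrs & tcr_nbrs)
    # Assumption: the clonotype itself is excluded, leaving num_clonotypes - 1 candidates.
    return overlap_pvalue(overlap, len(gex_nbrs), len(tcr_nbrs), num_clonotypes - 1)
```

Working with exact integers avoids the precision loss that a naive floating-point product of binomial coefficients would suffer at neighborhood sizes like K = 545.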

CoNGA looks at two kinds of graphs: K-nearest-neighbor (KNN) graphs, where K (the neighborhood size) is specified as a fraction of the number of clonotypes (the default fractions are 0.01 and 0.1), and cluster graphs, where each clonotype is connected to all other clonotypes in the same (GEX or TCR) cluster. Overlaps are computed three ways (GEX KNN vs TCR KNN, GEX KNN vs TCR cluster, and GEX cluster vs TCR KNN) for each neighbor fraction; these fractions are called nbr_fracs, short for "neighbor fractions".
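The two graph types can be sketched as per-clonotype neighbor sets. This is a minimal illustration, assuming a precomputed pairwise distance matrix and integer truncation of nbr_frac * num_clonotypes; all function names here are hypothetical. With num_clonotypes = 5453 (from the Stats above), the truncation reproduces the 54 and 545 seen in the num_neighbors columns for nbr_frac 0.01 and 0.1.

```python
from __future__ import annotations

import numpy as np


def num_neighbors(nbr_frac: float, num_clonotypes: int) -> int:
    # K as a fraction of the number of clonotypes (truncation is an assumption).
    return int(nbr_frac * num_clonotypes)


def knn_neighbors(dist: np.ndarray, k: int) -> list[set[int]]:
    """KNN graph: each clonotype's k closest clonotypes under `dist`."""
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)  # a clonotype is not its own neighbor
    order = np.argsort(d, axis=1, kind="stable")[:, :k]
    return [set(row.tolist()) for row in order]


def cluster_neighbors(labels) -> list[set[int]]:
    """Cluster graph: each clonotype is linked to every other member of its cluster."""
    labels = np.asarray(labels)
    out = []
    for i, lab in enumerate(labels):
        members = set(np.flatnonzero(labels == lab).tolist())
        members.discard(i)
        out.append(members)
    return out
```

The three overlap types then pair these sets per clonotype: knn(GEX) with knn(TCR), knn(GEX) with cluster(TCR), and cluster(GEX) with knn(TCR), repeated for each nbr_frac.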

Columns (these depend slightly on whether the hit is KNN vs KNN or KNN vs cluster):
conga_score = P value for the GEX/TCR overlap * number of clonotypes
mait_fraction = fraction of the overlap made up of 'invariant' T cells
num_neighbors* = size of the neighborhood (K)
cluster_size = size of the cluster (for KNN vs cluster graph overlaps)
clone_index = 0-based index of the clonotype in the adata object
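The conga_score definition amounts to a Bonferroni-style correction with one test per clonotype, so it is not a probability and can exceed 1. The mait_fraction values in the table appear to use the uncorrected overlap as the denominator: a row with overlap 87 shows 0.011494 (1/87), and a row with overlap 98 and overlap_corrected 97 shows 0.010204 (1/98). The sketch below encodes both relationships; the hit threshold of 1.0 and all function names are assumptions for illustration.

```python
def conga_score(pvalue: float, num_clonotypes: int) -> float:
    # Bonferroni-style: scale the raw overlap P value by the number of clonotypes tested.
    return pvalue * num_clonotypes


def mait_fraction(num_invariant_in_overlap: int, overlap: int) -> float:
    # Fraction of the (uncorrected) overlap made up of invariant clonotypes.
    return num_invariant_in_overlap / overlap if overlap else 0.0


def select_hits(scores, threshold: float = 1.0) -> list[int]:
    # Hypothetical default threshold: keep indices whose conga_score is at or below it.
    return [i for i, s in enumerate(scores) if s <= threshold]
```

Dividing a reported conga_score by num_clonotypes (5453 here) recovers the raw P value, e.g. 0.001559 / 5453 for the top row.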


conga_score num_neighbors_gex num_neighbors_tcr overlap overlap_corrected mait_fraction clone_index nbr_frac graph_overlap_type cluster_size gex_cluster tcr_cluster va ja cdr3a vb jb cdr3b
0.001559 NaN 54.0 14 12 0.000000 3634 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ53*01 CAGRLSGGSNYKLTF TRBV11-2*01 TRBJ1-2*01 CASSLTGNYGYTF
0.003253 54.0 54.0 8 7 0.000000 3528 0.01 gex_nbr_vs_tcr_nbr NaN 2 5 TRAV35*01 TRAJ42*01 CAAVNYGGSQGNLIF TRBV5-1*01 TRBJ2-2*01 CASSPRTGGSTGELFF
0.007744 545.0 NaN 108 108 0.000000 4928 0.10 gex_nbr_vs_tcr_cluster 708.0 0 1 TRAV8-4*01 TRAJ54*01 CAVSDRQGAQKLVF TRBV4-1*01 TRBJ2-1*01 CASRSGWANEQFF
0.015643 545.0 545.0 87 87 0.011494 4966 0.10 gex_nbr_vs_tcr_nbr NaN 2 12 TRAV8-6*01 TRAJ13*01 CAVITSGGYQKVTF TRBV20-1*01 TRBJ1-6*01 CSARDRTESSYNSPLHF
0.023260 NaN 54.0 13 12 0.000000 3556 0.01 gex_cluster_vs_tcr_nbr 257.0 6 5 TRAV35*01 TRAJ42*01 CAGMNYGGSQGNLIF TRBV11-2*01 TRBJ1-2*01 CASSQREGTLYGYTF
0.029243 545.0 545.0 86 86 0.011628 2403 0.10 gex_nbr_vs_tcr_nbr NaN 5 2 TRAV24*01 TRAJ44*01 CAPGTASKLTF TRBV20-1*01 TRBJ1-1*01 CSAREQRDTMNTEAFF
0.051957 NaN 545.0 98 98 0.010204 4361 0.10 gex_cluster_vs_tcr_nbr 652.0 2 12 TRAV8-1*01 TRAJ12*01 CAVTPGADSSYKLIF TRBV20-1*01 TRBJ2-3*01 CSALGVAGMGDGTQYF
0.052461 NaN 54.0 17 16 0.000000 3463 0.01 gex_cluster_vs_tcr_nbr 493.0 5 5 TRAV35*01 TRAJ17*01 CAGQLYKAAGNKLTF TRBV19*01 TRBJ2-3*01 CASSQGGLGVHF
0.054839 NaN 54.0 14 10 0.000000 3532 0.01 gex_cluster_vs_tcr_nbr 204.0 9 5 TRAV35*01 TRAJ42*01 CAGKNYGGSQGNLIF TRBV7-3*01 TRBJ2-3*01 CASSLRGDTQYF
0.055734 54.0 54.0 7 6 0.000000 3533 0.01 gex_nbr_vs_tcr_nbr NaN 9 5 TRAV35*01 TRAJ42*01 CAGKNYGGSQGNLIF TRBV7-3*01 TRBJ1-2*01 CASSPGPGSPYGYTF
0.057226 NaN 54.0 14 10 0.000000 3623 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ53*01 CAGLNSGGSNYKLTF TRBV6-4*01 TRBJ1-2*01 CASSARSGPLAGYTF
0.062457 545.0 NaN 79 73 0.000000 2856 0.10 gex_nbr_vs_tcr_cluster 461.0 5 5 TRAV27*01 TRAJ17*01 CAGAKAAGNKLTF TRBV7-2*01 TRBJ1-6*01 CASSLRTGGDNSPLHF
0.064224 545.0 NaN 105 104 0.000000 4586 0.10 gex_nbr_vs_tcr_cluster 708.0 0 1 TRAV8-3*01 TRAJ23*01 CVIINQGGKLIF TRBV24-1*01 TRBJ1-2*01 CATSKDRVYGYTF
0.066483 NaN 54.0 13 10 0.000000 3538 0.01 gex_cluster_vs_tcr_nbr 203.0 9 5 TRAV35*01 TRAJ42*01 CAGLNYGGSQGNLIF TRBV11-2*01 TRBJ1-2*01 CASSSRANGLNGYTF
0.069488 54.0 54.0 6 6 0.000000 3623 0.01 gex_nbr_vs_tcr_nbr NaN 9 5 TRAV35*01 TRAJ53*01 CAGLNSGGSNYKLTF TRBV6-4*01 TRBJ1-2*01 CASSARSGPLAGYTF
0.076880 NaN 54.0 12 10 0.000000 3574 0.01 gex_cluster_vs_tcr_nbr 201.0 9 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV6-2*01 TRBJ1-5*01 CASSYSQGQPQHF
0.078514 NaN 545.0 98 97 0.010204 2480 0.10 gex_cluster_vs_tcr_nbr 652.0 2 2 TRAV25*01 TRAJ38*01 CAGDNAGNNRKLIW TRBV20-1*01 TRBJ1-5*01 CSALNQGQYSNQPQHF
0.096380 545.0 NaN 28 28 0.000000 4963 0.10 gex_nbr_vs_tcr_cluster 122.0 2 12 TRAV8-6*01 TRAJ11*01 CAVSLGPSGYSTLTF TRBV20-1*01 TRBJ2-3*01 CSAIDRGQGDTQYF
0.138730 NaN 54.0 12 11 0.000000 3571 0.01 gex_cluster_vs_tcr_nbr 256.0 6 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV5-5*01 TRBJ2-1*01 CASSPRLAGSSYNEQFF
0.138730 NaN 54.0 12 11 0.000000 3577 0.01 gex_cluster_vs_tcr_nbr 256.0 6 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV7-8*01 TRBJ1-2*01 CASSPRQGAINGYTF
0.171267 NaN 54.0 11 11 0.000000 3547 0.01 gex_cluster_vs_tcr_nbr 256.0 6 5 TRAV35*01 TRAJ42*01 CAGLNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASSGRQGALYGYTF
0.174073 545.0 545.0 83 83 0.000000 4563 0.10 gex_nbr_vs_tcr_nbr NaN 0 1 TRAV8-3*01 TRAJ15*01 CAVGGNQAGTALIF TRBV4-2*01 TRBJ1-1*01 CASSQKGARGTEAFF
0.183396 545.0 NaN 76 75 0.000000 1102 0.10 gex_nbr_vs_tcr_cluster 482.0 2 2 TRAV13-2*01 TRAJ8*01 CAENTGFQKLVF TRBV20-1*01 TRBJ1-5*01 CSARIGQDQPQHF
0.185694 545.0 NaN 103 102 0.000000 4944 0.10 gex_nbr_vs_tcr_cluster 708.0 0 1 TRAV8-4*01 TRAJ8*01 CAVSDRLGTGFQKLVF TRBV25-1*01 TRBJ1-5*01 CASSDGVSQPQHF
0.213459 545.0 NaN 102 102 0.000000 5021 0.10 gex_nbr_vs_tcr_cluster 708.0 4 1 TRAV8-6*01 TRAJ32*01 CAVTPMGGATNKLIF TRBV5-5*01 TRBJ1-5*01 CASSPRDSRNQPQHF
0.213459 545.0 NaN 102 102 0.000000 5116 0.10 gex_nbr_vs_tcr_cluster 708.0 10 1 TRAV8-6*01 TRAJ9*01 CAVSGGTGGFKTIF TRBV19*01 TRBJ2-7*01 CASRPTSGSLDEQYF
0.213459 545.0 NaN 102 102 0.000000 5032 0.10 gex_nbr_vs_tcr_cluster 708.0 4 1 TRAV8-6*01 TRAJ37*01 CAVLGTGSSNTGKLIF TRBV6-2*01 TRBJ2-7*01 CASRQTLLGEQYF
0.213459 545.0 NaN 102 102 0.000000 4868 0.10 gex_nbr_vs_tcr_cluster 708.0 4 1 TRAV8-4*01 TRAJ43*01 CAVSAYNNNDMRF TRBV6-2*01 TRBJ2-7*01 CASNGGGAGEDEQYF
0.231893 NaN 545.0 96 95 0.000000 2869 0.10 gex_cluster_vs_tcr_nbr 652.0 2 2 TRAV27*01 TRAJ24*01 CAGARTTDSWGKLQF TRBV20-1*01 TRBJ2-2*01 CSASTSGNTGELFF
0.245305 NaN 545.0 86 86 0.000000 1325 0.10 gex_cluster_vs_tcr_nbr 575.0 3 14 TRAV16*01 TRAJ52*01 CALSGRGGGGTSYGKLTF TRBV18*01 TRBJ2-7*01 CASSPPGTEVQYF
0.260769 NaN 545.0 98 94 0.000000 3464 0.10 gex_cluster_vs_tcr_nbr 652.0 2 5 TRAV35*01 TRAJ17*01 CAGQLYRAAGNKLTF TRBV19*01 TRBJ1-2*01 CASSPAPGQGSIYGYTF
0.267568 545.0 NaN 75 71 0.000000 3633 0.10 gex_nbr_vs_tcr_cluster 460.0 6 5 TRAV35*01 TRAJ53*01 CAGRLSGGSNYKLTF TRBV11-2*01 TRBJ1-2*01 CASSLTGNYGYTF
0.270479 545.0 NaN 27 27 0.000000 4453 0.10 gex_nbr_vs_tcr_cluster 122.0 5 12 TRAV8-1*01 TRAJ50*01 CAVNGKTSYDKVIF TRBV20-1*01 TRBJ1-2*01 CSAPIGRGNYGYTF
0.292462 545.0 NaN 118 118 0.000000 4224 0.10 gex_nbr_vs_tcr_cluster 852.0 8 0 TRAV5*01 TRAJ39*01 CAESIHAGNMLTF TRBV11-2*01 TRBJ2-3*01 CASSLERNAAGADTQYF
0.292820 NaN 54.0 14 9 0.000000 3542 0.01 gex_cluster_vs_tcr_nbr 203.0 9 5 TRAV35*01 TRAJ42*01 CAGLNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASITKDRGFGYTF
0.314597 NaN 54.0 14 9 0.000000 3593 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ42*01 CAVMNYGGSQGNLIF TRBV5-1*01 TRBJ1-2*01 CASSAGRGDGYTF
0.314597 NaN 54.0 14 9 0.000000 3533 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ42*01 CAGKNYGGSQGNLIF TRBV7-3*01 TRBJ1-2*01 CASSPGPGSPYGYTF
0.314597 NaN 54.0 14 9 0.000000 3587 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ42*01 CAGRNYGGSQGNLIF TRBV7-3*01 TRBJ2-3*01 CASSPRHGTDTQYF
0.347869 545.0 NaN 77 70 0.000000 3390 0.10 gex_nbr_vs_tcr_cluster 461.0 9 5 TRAV30*01 TRAJ33*01 CGTALSNYQLIW TRBV9*01 TRBJ2-1*01 CASSLLDLRYNEQFF
0.386918 NaN 54.0 13 9 0.000000 3550 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ42*01 CAGLNYGGSQGNLIF TRBV14*01 TRBJ2-5*01 CASSKRQHSPAETQYF
0.410000 NaN 54.0 12 9 0.000000 3572 0.01 gex_cluster_vs_tcr_nbr 201.0 9 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV6-1*01 TRBJ1-2*01 CASFRGGVNGYTF
0.452737 545.0 NaN 75 70 0.000000 2855 0.10 gex_nbr_vs_tcr_cluster 461.0 5 5 TRAV27*01 TRAJ16*01 CAGRFSDGQKLLF TRBV5-1*01 TRBJ2-3*01 CASSPPGGSTDTQYF
0.455230 NaN 545.0 139 138 0.000000 77 0.10 gex_cluster_vs_tcr_nbr 1041.0 0 3 TRAV1-2*01 TRAJ9*01 CAVRETGGFKTIF TRBV3-1*01 TRBJ2-5*01 CASSQASGGRETQYF
0.466759 545.0 NaN 117 117 0.000000 1005 0.10 gex_nbr_vs_tcr_cluster 852.0 8 0 TRAV13-2*01 TRAJ3*01 CAEKMRGSSASKIIF TRBV5-5*01 TRBJ2-3*01 CASSGGGWADTQYF
0.499536 NaN 54.0 11 9 0.000000 3575 0.01 gex_cluster_vs_tcr_nbr 201.0 9 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV6-6*01 TRBJ1-2*01 CASSKRGDYGYTF
0.499536 NaN 54.0 11 9 0.000000 3573 0.01 gex_cluster_vs_tcr_nbr 201.0 9 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV6-2*01 TRBJ1-2*01 CASSPTRGALVGYTF
0.499536 NaN 54.0 11 9 0.000000 3567 0.01 gex_cluster_vs_tcr_nbr 201.0 9 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV11-2*01 TRBJ1-2*01 CASSPSRGSLGGYTF
0.506932 545.0 545.0 85 80 0.000000 3577 0.10 gex_nbr_vs_tcr_nbr NaN 6 5 TRAV35*01 TRAJ42*01 CAGQNYGGSQGNLIF TRBV7-8*01 TRBJ1-2*01 CASSPRQGAINGYTF
0.515636 545.0 NaN 74 70 0.000000 2440 0.10 gex_nbr_vs_tcr_cluster 461.0 5 5 TRAV25*01 TRAJ21*01 CAATYNFNKFYF TRBV6-1*01 TRBJ2-1*01 CASSLTREQFF
0.569272 NaN 545.0 95 93 0.000000 3439 0.10 gex_cluster_vs_tcr_nbr 652.0 2 5 TRAV35*01 TRAJ13*01 CAGQNSGGYQKVTF TRBV27*01 TRBJ1-5*01 CASSLYGYRGFGQPQHF
Omitted 39 lines
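Because the table above is truncated, further analysis should start from the full OUTFILE_PREFIX*.tsv files, for which pandas.read_csv(path, sep="\t") is a robust choice. As a dependency-free sketch, the hypothetical parser below reads rows in the whitespace-separated form printed on this page (three rows copied from the table) and counts hits per graph_overlap_type. It assumes no field contains spaces, which holds for the rows shown.

```python
from collections import Counter

HEADER = ("conga_score num_neighbors_gex num_neighbors_tcr overlap overlap_corrected "
          "mait_fraction clone_index nbr_frac graph_overlap_type cluster_size gex_cluster "
          "tcr_cluster va ja cdr3a vb jb cdr3b")

ROWS = """
0.001559 NaN 54.0 14 12 0.000000 3634 0.01 gex_cluster_vs_tcr_nbr 205.0 9 5 TRAV35*01 TRAJ53*01 CAGRLSGGSNYKLTF TRBV11-2*01 TRBJ1-2*01 CASSLTGNYGYTF
0.003253 54.0 54.0 8 7 0.000000 3528 0.01 gex_nbr_vs_tcr_nbr NaN 2 5 TRAV35*01 TRAJ42*01 CAAVNYGGSQGNLIF TRBV5-1*01 TRBJ2-2*01 CASSPRTGGSTGELFF
0.007744 545.0 NaN 108 108 0.000000 4928 0.10 gex_nbr_vs_tcr_cluster 708.0 0 1 TRAV8-4*01 TRAJ54*01 CAVSDRQGAQKLVF TRBV4-1*01 TRBJ2-1*01 CASRSGWANEQFF
"""


def convert(value: str):
    # Try int, then float (which also maps "NaN" to nan), else keep the string.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_results_table(header: str, text: str) -> list[dict]:
    cols = header.split()
    records = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != len(cols):
            raise ValueError(f"expected {len(cols)} fields, got {len(fields)}")
        records.append({c: convert(v) for c, v in zip(cols, fields)})
    return records


records = parse_results_table(HEADER, ROWS)
by_type = Counter(r["graph_overlap_type"] for r in records)
```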

graph_vs_graph_logos


This figure summarizes the results of a CoNGA analysis that produces CoNGA scores and clusters. At the top are six 2D UMAP projections of clonotypes in the dataset, based on GEX similarity (left three panels) and TCR similarity (right three panels), colored from left to right by: GEX cluster assignment; CoNGA score; joint GEX:TCR cluster assignment for clonotypes with significant CoNGA scores, shown as a bicolored disk whose left half indicates the GEX cluster and whose right half indicates the TCR cluster; TCR cluster assignment; CoNGA score; and GEX:TCR cluster assignment for CoNGA hits, as in the third panel.

Below are two rows of GEX landscape plots colored by (first row, left) expression of selected marker genes, (second row, left) Z-score normalized and GEX-neighborhood averaged expression of the same marker genes, and (both rows, right) TCR sequence features (see CoNGA manuscript Table S3 for TCR feature descriptions).
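The second-row values combine two steps: Z-score normalization of each gene and averaging over each clonotype's GEX neighborhood. The sketch below is one way to compute this, assuming per-gene Z-scores across clonotypes and a neighborhood mean that includes the clonotype itself; the function name is hypothetical, and CoNGA's own normalization details may differ.

```python
import numpy as np


def zscore_nbr_average(expr: np.ndarray, gex_nbrs) -> np.ndarray:
    """expr: (num_clonotypes, num_genes) matrix; gex_nbrs[i]: indices of i's GEX neighbors."""
    mean = expr.mean(axis=0)
    std = expr.std(axis=0)
    std[std == 0] = 1.0  # leave constant genes at Z = 0 instead of dividing by zero
    z = (expr - mean) / std
    out = np.empty_like(z)
    for i, nbrs in enumerate(gex_nbrs):
        idx = [i, *nbrs]  # assumption: the clonotype is included in its own average
        out[i] = z[idx].mean(axis=0)
    return out
```

Averaging over neighbors smooths out per-cell dropout, which is why the second row typically shows marker patterns more clearly than raw expression.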

GEX and TCR sequence features of CoNGA hits in clusters with 5 or more hits are summarized by a series of logo-style visualizations, from left to right: differentially expressed genes (DEGs); TCR sequence logos showing the V and J gene usage and CDR3 sequences for the TCR alpha and beta chains; biased TCR sequence scores, with red indicating elevated scores and blue indicating decreased scores relative to the rest of the dataset (see CoNGA manuscript Table S3 for score definitions); GEX 'logos' for each cluster consisting of a panel of marker genes shown with red disks colored by mean expression and sized according to the fraction of cells expressing the gene (gene names are given above).

DEG and TCR sequence logos are scaled by the adjusted P value of the associations, with full logo height requiring a top adjusted P value below 10^-6. DEGs with a fold change less than 2 are shown in gray. Each cluster is indicated by a bicolored disk colored according to its GEX cluster (left half) and TCR cluster (right half). The two numbers above each disk show the number of hits in the cluster (left) and the total number of cells in those clonotypes (right). The dendrogram at the left shows similarity relationships among the clusters based on connections in the GEX and TCR neighbor graphs.
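The per-cluster bookkeeping described above (keeping GEX:TCR clusters with 5 or more hits, then reporting the hit count and total cell count) can be sketched as follows. The input format, a list of (gex_cluster, tcr_cluster, clone_size) tuples for hit clonotypes, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def summarize_biclusters(hits, min_hits: int = 5) -> dict:
    """hits: iterable of (gex_cluster, tcr_cluster, clone_size) for CoNGA hit clonotypes.

    Returns {(gex_cluster, tcr_cluster): (num_hits, total_cells)} for clusters
    with at least min_hits hits.
    """
    counts = defaultdict(lambda: [0, 0])
    for gex, tcr, size in hits:
        counts[(gex, tcr)][0] += 1      # one more hit clonotype in this cluster
        counts[(gex, tcr)][1] += size   # all cells belonging to that clonotype
    return {pair: (n, cells) for pair, (n, cells) in counts.items() if n >= min_hits}
```

Because the right-hand number sums clone sizes, a cluster with few hits can still cover many cells if it contains expanded clonotypes.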

The marker genes used for the GEX UMAP panels and for the cluster GEX logos can be configured using run_conga.py command-line flags or arguments to the conga.plotting.make_logo_plots function.
Image source: ./CoNGA.output_graph_vs_graph_logos.png